Skip to content

Place non-georeferenced VRT tiles by explicit pixel offsets (#3116)#3135

Merged
brendancol merged 4 commits into
mainfrom
deep-sweep-metadata-geotiff-2026-06-09
Jun 10, 2026
Merged

Place non-georeferenced VRT tiles by explicit pixel offsets (#3116)#3135
brendancol merged 4 commits into
mainfrom
deep-sweep-metadata-geotiff-2026-06-09

Conversation

@brendancol

@brendancol brendancol commented Jun 9, 2026

Copy link
Copy Markdown
Contributor

to_geotiff(da, 'out.vrt', tile_size=N) on a non-georeferenced array spanning more than one tile wrote a corrupt index. Placement was derived from each tile's GeoTransform, and non-georef tiles all carry the default identity transform, so every DstRect landed at the origin and rasterXSize/rasterYSize shrank to a single tile. Reading the VRT back silently returned one tile's data (a 24x32 input came back as 16x16). Gap left by #2971, whose tests only covered a single non-georef source.

Proposed Changes

  • _write_vrt_tiled records each tile's pixel offsets and passes them through _build_vrt to write_vrt via an internal dst_offsets kwarg when the input has no georeferencing. The dask-backed and plain-ndarray .vrt writes go through the same loop, so both are covered.
  • write_vrt refuses more than one all-non-georef source without explicit placement (identity transforms cannot place the tiles), and rejects dst_offsets alongside georeferenced sources, which place via their GeoTransform. The dst_offsets docstring states that offsets are not checked for overlap or full coverage; direct callers own their layout.
  • Georeferenced placement math is unchanged; the per-source offset computation moved out of the XML loop but produces the same values.
  • Merged origin/main to bring the branch up to date; the only conflict was in .claude/sweep-metadata-state.csv (HEAD geotiff row kept, new mcda row from main included).

Backend coverage: the writer fix is backend-independent; the round-trip test reads the repaired VRT on numpy, cupy, dask+numpy, and dask+cupy.

Test plan:

  • 18 new tests in tests/vrt/test_non_georef_placement_3116.py: 4-backend round trip, dask-backed write, plain-ndarray write, single-tile no-op, XML DstRect assertions, georef placement regression guard, and the write_vrt error contract (ambiguous placement, georef + offsets, length mismatch, malformed pairs).
  • Full VRT suite: 520 passed.
  • Write + round-trip + signature suites: 1292 passed, 1 skipped.
  • flake8 / isort clean on touched files.

Also carries the metadata-sweep state CSV row for the geotiff module (sweep bookkeeping).

to_geotiff(da, 'out.vrt', tile_size=N) on a non-georeferenced array
spanning more than one tile wrote a corrupt index: placement came from
each tile's GeoTransform, and non-georef tiles all carry the default
identity transform, so every DstRect landed at the origin and the
mosaic shrank to one tile. Reading the VRT back silently returned a
single tile's data with wrong dims and coords.

_write_vrt_tiled now records each tile's pixel offsets and passes them
through _build_vrt to write_vrt via an internal dst_offsets kwarg when
the input has no georeferencing. write_vrt refuses more than one
all-non-georef source without explicit placement, and rejects
dst_offsets alongside georeferenced sources (those place via their
GeoTransform). Georeferenced placement math is unchanged.

Also commits the metadata-sweep state row for the geotiff module.

@brendancol brendancol left a comment

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

PR Review: Place non-georeferenced VRT tiles by explicit pixel offsets (#3116)

Blockers (must fix before merge)

None.

Suggestions (should fix, not blocking)

  • xrspatial/geotiff/_vrt.py (dst_offsets docstring): the contract doesn't say what happens when explicit offsets overlap or leave gaps. The only production caller (_write_vrt_tiled) always produces a full non-overlapping cover, but a direct internal caller gets no validation and no documented behaviour. A one-line note ("offsets are not checked for overlap or full coverage") would pin that down.

Nits (optional improvements)

  • tests/vrt/test_non_georef_placement_3116.py:92open(vrt_path).read() leaks the file handle; pathlib.Path(vrt_path).read_text() avoids the ResourceWarning under stricter warning filters.
  • tests/vrt/test_non_georef_placement_3116.py:222 — the hash(str(bad)) & 0xffff filename suffix isn't needed: pytest gives each parametrized case its own tmp_path, so a fixed name can't collide. The hash also varies between runs (string hashing is salted), which makes failure artifacts harder to compare.

What looks good

  • The placement refactor is value-preserving for georeferenced sources: the per-source offset math moved out of the XML loop verbatim, and test_georef_multi_tile_placement_unchanged plus the existing 520-test VRT suite pin it.
  • The fail-closed gate for multiple non-georef sources without offsets matches the _check_no_mixed_georef precedent, and the error message names the recovery path.
  • dst_offsets is rejected alongside georeferenced sources instead of being silently ignored, so the two placement mechanisms can't disagree.
  • The bool/float/negative/arity rejections on offset pairs are each covered by a parametrized test, and the True-as-int case is handled explicitly.
  • Round-trip coverage spans all four reader backends, plus the dask-backed and plain-ndarray write branches, which share the fixed loop.
  • The XML-level test asserts the exact DstRect offsets and mosaic size, so a placement regression can't hide behind a reader that compensates.

Checklist

  • Algorithm matches reference (VRT pixel-space DstRect placement; verified against the emitted XML)
  • All implemented backends produce consistent results (4-backend round-trip test)
  • NaN handling is correct (not applicable; writer-side index placement, no pixel math)
  • Edge cases are covered by tests (single tile, ragged last tile row/column, plain ndarray, dask chunk grid)
  • Dask chunk boundaries handled correctly (offsets follow the chunk grid, not tile_size)
  • No premature materialization or unnecessary copies (metadata-only pass unchanged)
  • Benchmark exists or is not needed (writer bug fix; not needed)
  • README feature matrix updated (not applicable; no new function or backend change)
  • Docstrings present and accurate (modulo the overlap/gap note above)

@brendancol brendancol left a comment

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Follow-up review after e0da0b1.

All three findings from the first pass are addressed:

  • dst_offsets docstring now states that offsets are not checked for overlap or coverage and that direct callers own their layout (fixed).
  • XML assertion test reads via pathlib.Path.read_text, no leaked handle (fixed).
  • The salted-hash filename suffix in the bad-pair test is gone; tmp_path already isolates parametrized cases (fixed).

Test suite re-run after the changes: 18/18 in test_non_georef_placement_3116.py, flake8 and isort clean. Nothing further from me.

@github-actions github-actions Bot added the performance PR touches performance-sensitive code label Jun 10, 2026
test_read_geotiff_dask_handles_vrt_directly builds a 2-source VRT from
non-georeferenced tiles, which the new placement guard now refuses
without dst_offsets. The test only verifies that _read_geotiff_dask
routes .vrt sources to the VRT reader, so spell out the side-by-side
layout its comment already described. Before the guard, this setup
produced the collapsed index #3116 fixed; the test passed only because
it asserts dims alone.
@brendancol

Copy link
Copy Markdown
Contributor Author

@copilot resolve the merge conflicts in this pull request

…eotiff-2026-06-09

# Conflicts:
#	.claude/sweep-metadata-state.csv

Copilot AI commented Jun 10, 2026

Copy link
Copy Markdown
Contributor

@copilot resolve the merge conflicts in this pull request

Merged origin/main into this branch. The only conflict was in .claude/sweep-metadata-state.csv — both sides had modified the geotiff row; I kept the HEAD version (2026-06-09, #3116) and included the new mcda row that main added. Merge commit: d66a228.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

performance PR touches performance-sensitive code

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants